In this report, we analyze the log cpm expression for 24 selected genes in 870 samples from 54 people. First we show the mean (and 95% confidence of mean) for the log2 cpm expression for each gene across the sun protected (SP), sun damaged (SD), AK edge (AKEdge) and AK center (AKCenter) and cSCC regions in the epidermis and dermis tissues separately. We the then look at the pairwise comparisons of each of those reasons using a random intercept model. We plot the pairwise comparison for genes that were differentially expressed in at least one region. Next we look at the log2 cpm trend from SP to AK center using generalized estimating equations. We found HLA-TPB1 had an up regulated trend from SP to AK center and plot the population average trend.. Next we Differential expression for epi SD, AK edge and AK center vs cSCC as well as derm SD, AK edge and AK center vss cSCC stroma. We found several genes that were up or down regulated in cSCC samples compared to the three other regions. Finally, we highlight the results in an interactive heatmap.
Code
# Get packages packages <-c("NanoStringNCTools","GeomxTools","GeoMxWorkflows","knitr","kableExtra","magrittr","ggplot2","plotly","patchwork","data.table","geepack", "nlme","emmeans")for (pkg in packages) {# If package is not installed then install it. if(!(pkg %in%installed.packages()[,1])){install.packages(pkg)}# Load package. library(pkg,character.only =TRUE)}for (i indir("R")) {source(paste0("R/",i))}
Data loading
Here we are loading the preprocessed data. This data is stored at Processed/StandardizedNanostringData.rds and contains the raw reads and the log2 counts per million gene expression data. Additionally, it contains the phenotypic information (AOI, ROI, SubjectID) and quality control metrics for each sample. We create an analysis data csv file which will contain the phenotype information, AOI, ROI, SubjectID and the log2 CPM for each gene.
Note: For subjectIDs 200016 and 200018, the Ak edge and AK center sample were ran on plate 5 and then reran on plate 14. We will be using the samples from plate 14 and therefore removing AK edge and AK center samples for these subjectIDs on plate 5.
Note: We have two subjects with matched SP/SD/AK samples with cSCC. The cSCC samples have different subjectIDs, therefore, we are changing the cSCC subjectIDs for these two participants to match the subjectIDs used for the SP/SD/AK samples.
SP through AK Subject ID
cSCC Subject ID
300103
101123
300108
101152
Code
###################################################################### This chunk is intendend to create the geneAnalysisData.csv from the StandardizedNanostringData.rds## Nanostring object that contains the phenotype information, gene annotations, ## and the Raw reads and the CPM standardized reads for each sample. #### This chunk is not nessisary to run if the GeneAnalysisData.csv has## already been created.########################################################################################################### Load Normalized Data####################################### This is a link to the processed data on cyversedata_path = data_path ="https://data.cyverse.org/dav-anon/iplant/projects/SkinPPG/H202SC24094550/Nanostring/Processed/StandardizedNanostringData.rds"# You may need to edit this code to update pathnormalizedData <-readRDS(url(data_path))############################################ Pull use the log2(CPM) expression data########################################### Get the assay data for the modeling CPM assay <-data.table(t(assayDataElement(normalizedData, elt ="CPM")), keep.rownames =TRUE) # Check 2^x for each row should give us 1 million#table(round(rowSums(2^assay[,-'rn']),6))#dim(assay)############################################ Pull meta data nanostring data and add the ##. Ordinal region variable##. SP =0##. SD = 1##. AK edge = 2##. AK center = 3##########################################metaData <-data.table(normalizedData@phenoData@data[,c("SubjectID","AOIInfo","Region","Roi","Plate")], keep.rownames =TRUE)# Add ordinal variablemetaData[,RegionOrdinal :=fcase( Region =="SunProtected", 0, Region =="SunDamaged",1, Region =="AKEdge", 2, Region =="AKCenter",3)]# Compare plate 5 and 14 Subjects 200016 and 200018 had samples re#plate5 <- metaData[Plate=="Skin5"]#plate14 <- metaData[Plate=="Skin14"]#table(plate5$SubjectID)#table(plate14$SubjectID)# SubjectID 200016 had AK edge and AK center reran#subject16 <- metaData[SubjectID==200016]#table(subject16$Plate,subject16$Region,subject16$AOIInfo)#subject18 <- metaData[SubjectID==200018]#table(subject18$Plate,subject18$Region,subject18$AOIInfo)#table(metaData$Region, metaData$RegionOrdinal)#class(metaData$RegionOrdinal)######################################################### For subjectIDs 200016 and 200018, regions AKEdge ## and AK Center were ran on plate 5 and then again##. on plate 14. This means the samples were## ran twice. We will remove the duplicated samples## From plate 5 from the down stream analysis######################################################### Get the sample names for the duplicated samplesduplicatedSamps <- metaData[(SubjectID %in%c(200016,200018)) &grepl("AK",Region) & Plate=="Skin5"]# Subset the phenotype data to exclude the duplicated samplesmetaData <- metaData[!(rn %in% duplicatedSamps$rn)]##################################################################. There are 3 subjects with matched SP and cSCC samples##. However, the subject IDs are different. We will convert##. The cSCC subjectIDs to match the SP IDs. See note above.#### | SP through AK Subject ID | cSCC Subject ID |# |--------------------------|-----------------|# | 300103 | 101123 |# | 300108 | 101152 |############################################################### Make sure all of the ids are in the meta dataids <-c("300103","300108","101123","101152")metaData[, SubjectIDMatched :=ifelse(SubjectID=="101123","300103",ifelse(SubjectID=="101152","300108", SubjectID))]# metaData[grepl("300103",SubjectIDMatched),.(SubjectID,SubjectIDMatched,AOIInfo)]# metaData[grepl("300108",SubjectIDMatched),.(SubjectID,SubjectIDMatched,AOIInfo)]############################################## Merge the assay data and metaData into ## one analysis dataset############################################AnalysisData <-merge(metaData,assay,by='rn')# Change the rn column to sample_IDsetnames(AnalysisData, old ='rn', new ="SampleID")################################################ Create sample distribution table## We are splitting the tables up to make the tables## easier to read in the html file. ################################################ Gets the number of samples for each participant grouped by Region and AOIdist <- AnalysisData[,.(N= .N), by=.(SubjectIDMatched,Region,AOIInfo) ][,RegionAOI :=paste0(Region," (",AOIInfo,")")]# filter for Non cSCC samplesoptions(knitr.kable.NA ='')tableNoncSCC <-dcast.data.table(SubjectIDMatched~RegionAOI, value.var ="N", data=dist[AOIInfo!="cSCC"])# Add row totalstableNoncSCC[,Total:=rowSums(.SD,na.rm = T), .SDcols = is.numeric]# Generate tablerbind(tableNoncSCC,as.list(c("Total",colSums(tableNoncSCC[,-1], na.rm = T))),use.names=F)%>%kbl(caption ="Sample distribution for non cSCC samples") %>%kable_classic() %>%scroll_box(width ="100%", height ="400px")
Sample distribution for non cSCC samples
SubjectIDMatched
AKCenter (Derm)
AKCenter (Epi)
AKEdge (Derm)
AKEdge (Epi)
SunDamaged (Derm)
SunDamaged (Epi)
SunProtected (Derm)
SunProtected (Epi)
Total
200003
3
3
3
3
12
200009
3
3
3
2
2
3
3
2
21
200011
3
3
3
3
1
3
2
18
200016
3
3
3
3
3
3
3
3
24
200018
3
3
3
3
3
3
3
3
24
200020
6
6
6
6
3
3
3
3
36
200029
6
6
6
6
3
3
3
3
36
20003
3
3
6
200034
3
3
3
3
3
3
3
3
24
200059
2
3
3
3
3
3
3
3
23
200067
6
6
5
6
3
3
3
3
35
200069
3
3
3
3
12
200073
6
6
6
6
3
3
3
3
36
200074
3
3
3
3
3
3
3
3
24
200083
3
3
3
3
3
3
3
3
24
200095
6
6
5
5
3
3
3
3
34
20075
3
3
3
3
3
3
3
3
24
20082
3
3
3
3
3
3
3
3
24
20096
6
6
6
6
3
3
3
3
36
300103
3
3
3
3
3
3
3
3
24
300105
6
6
6
6
3
3
3
3
36
300108
3
3
2
3
3
3
3
3
23
300110
2
3
3
3
2
3
3
3
22
300112
6
6
6
6
3
3
3
3
36
Total
88
93
87
91
61
64
66
64
614
Code
# Show cSCC samplestablecSCC<-dcast.data.table(SubjectIDMatched~RegionAOI, value.var ="N", data=dist[AOIInfo=="cSCC"])# Add row totalstablecSCC[,Total:=rowSums(.SD,na.rm = T), .SDcols = is.numeric]rbind(tablecSCC,as.list(c("Total",colSums(tablecSCC[,-1], na.rm = T))),use.names=F)%>%kbl(caption ="Sample distribution for non cSCC samples") %>%kable_classic() %>%scroll_box(width ="100%", height ="400px")
Sample distribution for non cSCC samples
SubjectIDMatched
cSCC_Center (cSCC)
cSCC_Edge (cSCC)
cSCC_Stroma (cSCC)
cSCC_Tumor (cSCC)
Total
100026
3
3
3
9
100244
3
3
6
100264
3
3
6
10067
3
3
3
9
100925
3
3
6
100931
3
3
3
9
100945
3
3
2
8
101026
3
3
1
7
101035
3
3
3
9
101036
3
3
3
9
101037
3
3
3
9
101040
2
2
101041
3
3
3
9
101042
3
3
3
9
101065
3
3
3
9
101066
3
3
3
9
101067
3
3
3
9
101073
3
3
3
9
101091
1
3
3
7
101096
3
3
2
8
101099
3
3
3
9
101113
3
3
3
9
101126
3
3
3
9
101127
3
3
3
9
101129
3
3
3
9
101130
3
2
5
101134
3
3
6
101138
3
3
3
9
101146
3
3
3
9
101149
3
3
6
300103
3
3
3
9
300108
3
3
3
9
Total
73
75
91
17
256
Code
#names(AnalysisData)[1:10]############################################################################### Save analysis data###############################################################################saveRDS(AnalysisData , "GeneAnalysisData.RDS")
Means Accross ROI
For selected genes, we will be plotting the mean and 95% confidence intervals across ROI’s for each ROI. We are creating
There are interesting patterns of expression for genes CD274, CD28, CD80, and CD86 for epidermis skin. Additionally, genes CIITA, HLA-DMA, HLA-DPB1, and HLA-DRB1 in the dermis skin.
Code
# Read in analysis dataanalysis <-readRDS("GeneAnalysisData.RDS")setDT(analysis) # turn data into a data table object################################################################### Create gene list##################################################################genes<-c("CD274","CD28","CD80","CD86","CIITA","CTLA4","HAVCR2","HLA-DMA","HLA-DOA","HLA-DOB","HLA-DPB1","HLA-DRB1","LAG3","LGALS1","LGALS3","LGALS4","PDCD1","PDCD1LG2", "PVR", "TIGIT", "TNFRSF18", "TNFRSF9", "TNFSF18", "TNFSF9" )############################################################### Make analysis data long so the genes are in the rows#############################################################cols <-c("SubjectID","AOIInfo","Region","SubjectIDMatched",genes)analysisLong <-melt.data.table(analysis[,.SD,.SDcols = cols],id.vars =c("SubjectID","AOIInfo","Region","SubjectIDMatched"),variable.name ="genes",value.name ="exp")################################################################################# Split data into Epi and Derm######. With cSCC Stroma in the Derm strata and cSCC Tumor, edge and center in the Epi strata. ################################################################################epiDat <- analysisLong[AOIInfo =="Epi"| (Region %in%c("cSCC_Edge","cSCC_Center","cSCC_Tumor"))][,Region :=ifelse( Region %in%c("cSCC_Edge","cSCC_Center","cSCC_Tumor"), "cSCC", Region)]# Check#table(epiDat$AOIInfo)#table(epiDat$Region)# DermisdermDat <- analysisLong[AOIInfo =="Derm"| (Region =="cSCC_Stroma")]#table(dermDat$AOIInfo)#table(dermDat$Region)################################################################################## Create line plots for each strata without SP###############################################################################epiPlot <-geneMeans(epiDat)# Make region an ordered factorepiPlot[,Region:=factor(Region,levels =c("SunProtected","SunDamaged","AKEdge","AKCenter","cSCC"))]dermPlot <-geneMeans(dermDat)# Make region an ordered factordermPlot[,Region:=factor(Region,levels =c("SunProtected","SunDamaged","AKEdge","AKCenter","cSCC_Stroma"))]# Check#epiPlot#dermPlot###################################### Epi####################################genes12 <- genes[1:12]# Epi plotggplot(epiPlot[genes %in% genes12 & Region !="SunProtected"], aes(x=as.numeric(Region),y=M)) +geom_point() +geom_line() +geom_errorbar(aes(ymin = lower, ymax=upper),width=0.25)+facet_wrap(~genes, scales ="free_y")+scale_x_continuous(breaks =c(2,3,4,5),labels =c("SD","AK-Edge","AK-Center","cSCC")) +theme_bw() +ylab("log2(CPM)") +xlab("Region") +ggtitle("Means and 95% confidence intervals accross regions","Epi set 1") +theme(axis.text.x =element_text(angle =90))
Here we are looking at the pairwise comparisons for the genes above. We use a random intercept mixed model to account for the repeated samples within a person. We first test if any of the regions were differentially expressed using analysis of variance. We then plot the pairwise comparisons for each gene showing at least one region was differentially expressed.
Code
########################################################### Here we are going to look at all pairwise comparisons ## for the plots above. ############################################################ Read in analysis dataanalysis <-readRDS("GeneAnalysisData.RDS")setDT(analysis) # turn data into a data table object################################################################### Create gene list##################################################################genes<-c("CD274","CD28","CD80","CD86","CIITA","CTLA4","HAVCR2","HLA-DMA","HLA-DOA","HLA-DOB","HLA-DPB1","HLA-DRB1","LAG3","LGALS1","LGALS3","LGALS4","PDCD1","PDCD1LG2", "PVR", "TIGIT", "TNFRSF18", "TNFRSF9", "TNFSF18", "TNFSF9" )############################################################### Make analysis data long so the genes are in the rows#############################################################cols <-c("SubjectID","AOIInfo","Region","SubjectIDMatched",genes)analysisLong <-melt.data.table(analysis[,.SD,.SDcols = cols],id.vars =c("SubjectID","AOIInfo","Region","SubjectIDMatched"),variable.name ="genes",value.name ="exp")################################################################################# Split data into Epi and Derm######. With cSCC Stroma in the Derm strata and cSCC Tumor, edge and center in the Epi strata. ################################################################################epiDat <- analysisLong[AOIInfo =="Epi"| (Region %in%c("cSCC_Edge","cSCC_Center","cSCC_Tumor"))][,Region :=ifelse( Region %in%c("cSCC_Edge","cSCC_Center","cSCC_Tumor"), "cSCC", Region) ][Region!="SunProtected",][,Region :=factor(Region,levels =c("SunDamaged","AKEdge","AKCenter","cSCC"))]# Check#table(epiDat$AOIInfo)#table(epiDat$Region)# DermisdermDat <- analysisLong[AOIInfo =="Derm"| (Region =="cSCC_Stroma")][Region!="SunProtected",][ , Region :=factor(Region,levels =c("SunDamaged","AKEdge","AKCenter","cSCC_Stroma"))]#table(dermDat$AOIInfo)#table(dermDat$Region)################################################################################# Get pairwise comparisons for Epi and Derm. Code in the R folder.###############################################################################EpiPairs <-paircomps(epiDat)# Order the contrasts EpiPairs[,contrast :=factor(contrast,levels =c("SunDamaged - AKEdge","SunDamaged - AKCenter","SunDamaged - cSCC","AKEdge - AKCenter","AKEdge - cSCC","AKCenter - cSCC"))]#table(EpiPairs$contrast)DermPairs <-paircomps(dermDat)DermPairs[,contrast :=factor(contrast,levels =c("SunDamaged - AKEdge","SunDamaged - AKCenter","SunDamaged - cSCC_Stroma","AKEdge - AKCenter","AKEdge - cSCC_Stroma","AKCenter - cSCC_Stroma"))]#table(DermPairs$contrast)################################################################################# Plot results################################################################################ Significant Epi genessigEpi <- EpiPairs[OverallTest<0.05,][,Sig :=fcase(0>= Lower95Conf &0<= Upper95Conf, "",default ="**")]ggplot(sigEpi, aes(x=contrast,y=Est)) +geom_point() +geom_errorbar(aes(ymin = Lower95Conf,ymax = Upper95Conf), width=0.25) +facet_wrap(~genes,scale="free_y") +geom_text(aes(x = contrast, y=Upper95Conf+0.025, label = Sig)) +theme_bw() +theme(axis.text.x =element_text(angle=90)) +ggtitle("Genes with at least one significant comparison.","Epidermis") +ylab("Estimate") +xlab("")+labs(caption ="** represents a significant comparison. A random intercept model was fit to account for the within person correlation Laird and Ware (1982)")
Code
# Significant Derm GenessigDerm <- DermPairs[OverallTest<0.05,][,Sig :=fcase(0>= Lower95Conf &0<= Upper95Conf, "",default ="**")]ggplot(sigDerm, aes(x=contrast,y=Est)) +geom_point() +geom_errorbar(aes(ymin = Lower95Conf,ymax = Upper95Conf), width=0.25) +facet_wrap(~genes,scale="free_y") +geom_text(aes(x = contrast, y=Upper95Conf+0.025, label = Sig)) +theme_bw() +theme(axis.text.x =element_text(angle=90)) +ggtitle("Genes with at least one significant comparison.","Dermis") +ylab("Estimate") +xlab("") +labs(caption ="** represents a significant comparison. A random intercept model was fit to account for the within person correlation Laird and Ware (1982)")
Code
################################################################################## Save results################################################################################write.csv(EpiPairs,"EpiPairwiseComparisons.csv")write.csv(DermPairs,"DermPairwiseComparisons.csv")
Trajectory Analysis
The goal of this analysis is to identify genes with either a positive or negative trend from sun-protected to AK-Center. We assign a numeric value to each region (SP=0, SD=1, AKEdge=2, AKCenter=3). We use generalized estimating equations to estimate the population average trend for each gene from SP to AKCenter. For genes with a significant trend we plot the population average trend estimated from the gee model and the personal trends (estimated by a linear model) for each person (in gray) to see how the trajectory of gene expression from SP-AKCenter varies between people. In the epidermis only the HLA-DPB1 gene had a significant expression trend.
Code
# remove data in enviroment (this keeps memory low)#rm(list=ls())######################################### Load the analysis data. ######################################## Load in the analysis1 data analysis1 and make sure subject ID is a factor. analysis <-readRDS("GeneAnalysisData.RDS")setDT(analysis)#dim(analysis) # should be 870 rows. #Check #analysis[1:10,1:10] #table(analysis1$SubjectID) #table(analysis1$Region,analysis1$RegionOrdinal)################################################################### Create gene list##################################################################genes<-c("CD274","CD28","CD80","CD86","CIITA","CTLA4","HAVCR2","HLA-DMA","HLA-DOA","HLA-DOB","HLA-DPB1","HLA-DRB1","LAG3","LGALS1","LGALS3","LGALS4","PDCD1","PDCD1LG2", "PVR", "TIGIT", "TNFRSF18", "TNFRSF9", "TNFSF18", "TNFSF9" )#genes[!(genes %in% names(analysis))]#################################################### Make dataset from wide to long#################################################cols <-c("SubjectIDMatched", "RegionOrdinal","AOIInfo", genes)analysis_Long <-melt.data.table(analysis[,.SD, .SDcols = cols],id.vars =c("SubjectIDMatched", "RegionOrdinal","AOIInfo"),variable.name ="Gene",value.name ="Exp")############################################# Get Epi Results############################################ filter only for only epi samples, order by subjectIDEpi <- analysis_Long[AOIInfo=="Epi"][order(SubjectIDMatched)] # Order by subjectID# Get derm resultsEpiResults <-trendMod(Epi)# Display resultsEpiResults[order(pvalue),.(Gene,Trend=round(Estimate,3), pvalue=round(pvalue,3))] %>%kbl(caption ="Epi results for SP-AKCenter trend") %>%kable_classic() %>%scroll_box(width ="100%", height ="400px")
Epi results for SP-AKCenter trend
Gene
Trend
pvalue
HLA-DPB1
0.133
0.004
CD80
-0.068
0.248
CIITA
0.046
0.291
HLA-DOB
-0.050
0.305
PDCD1LG2
0.055
0.310
HLA-DRB1
0.063
0.352
CD274
-0.056
0.409
LGALS3
-0.073
0.419
CTLA4
0.045
0.447
TNFSF9
-0.025
0.573
LGALS1
0.038
0.580
HLA-DOA
-0.024
0.636
PVR
-0.016
0.659
HAVCR2
0.020
0.677
TIGIT
0.021
0.682
CD86
0.020
0.704
TNFSF18
0.015
0.776
CD28
-0.012
0.818
PDCD1
-0.014
0.819
LAG3
0.014
0.819
TNFRSF9
0.014
0.819
LGALS4
0.004
0.938
TNFRSF18
-0.003
0.954
HLA-DMA
0.001
0.983
Code
############################################# Get Derm Results############################################ filter only for only Derm samples, order by subjectIDDerm <- analysis_Long[AOIInfo=="Derm"][order(SubjectIDMatched)] # Order by subjectID# Get derm resultsDermResults <-trendMod(Derm)DermResults[order(pvalue),.(Gene,Trend=round(Estimate,3), pvalue=round(pvalue,3))] %>%kbl(caption ="Derm results for SP-AKCenter trend") %>%kable_classic() %>%scroll_box(width ="100%", height ="400px")
Derm results for SP-AKCenter trend
Gene
Trend
pvalue
TNFSF18
-0.087
0.053
HAVCR2
0.061
0.066
TNFSF9
-0.066
0.076
CTLA4
0.102
0.088
TNFRSF18
0.047
0.149
HLA-DRB1
0.150
0.187
HLA-DMA
0.077
0.204
CIITA
0.058
0.209
HLA-DPB1
0.099
0.321
TIGIT
0.047
0.340
PVR
0.022
0.466
CD274
0.028
0.510
HLA-DOB
0.021
0.622
CD80
-0.015
0.679
LGALS4
-0.022
0.685
LGALS1
0.034
0.724
TNFRSF9
0.019
0.725
CD28
-0.014
0.733
LGALS3
-0.018
0.785
PDCD1
-0.014
0.797
HLA-DOA
0.010
0.811
LAG3
0.011
0.813
PDCD1LG2
0.005
0.923
CD86
0.006
0.927
Code
# Epi genes with significant trendsigEpi <-unlist(EpiResults[pvalue<0.05,.(Gene)])plots(Epi[Gene %in% sigEpi],EpiResults[Gene%in%sigEpi])
Code
# Save resultswrite.csv(EpiResults[,.(Gene,Trend=round(Estimate,3),pvalue=round(pvalue,3))], file ="EpiTrendSelectGenes.csv")write.csv(DermResults[,.(Gene,Trend=round(Estimate,3),pvalue =round(pvalue,3))],"DermTrendSelectGenes.csv")
Appendix
Means across ROI
For selected genes, we will be plotting the mean and 95% confidence intervals across ROI’s. We will use generalized estimating equations to estimate the mean log2 CPM expression in each ROI. The generalized estimating approach allows us to account for the within person correlation. We use a large sample approximation to estimate the 95% confidence intervals. The selected genes we are looking at are:
# Read in analysis dataanalysis <-readRDS("GeneAnalysisData.RDS")setDT(analysis) # turn data into a data table object# Make factor region to make sure SP-AKCenter are in the right order. analysis[,Region:=factor(Region,levels =c("SunProtected","SunDamaged","AKEdge","AKCenter","cSCC_Stroma","cSCC_Edge","cSCC_Center","cSCC_Tumor"))]################################################################### Create gene list##################################################################genes<-c("CD274","CD28","CD80","CD86","CIITA","CTLA4","HAVCR2","HLA-DMA","HLA-DOA","HLA-DOB","HLA-DPB1","HLA-DRB1","LAG3","LGALS1","LGALS3","LGALS4","PDCD1","PDCD1LG2", "PVR", "TIGIT", "TNFRSF18", "TNFRSF9", "TNFSF18", "TNFSF9" )############################################################### Make analysis data long so the genes are in the rows#############################################################cols <-c("SubjectIDMatched","AOIInfo","Region",genes)analysisLong <-melt.data.table(analysis[,.SD,.SDcols = cols],id.vars =c("SubjectIDMatched","AOIInfo","Region"),variable.name ="genes",value.name ="exp")################################################################# Get Epi and Derm means and se's for each region.### ############################################################### filter only for only epi samples, order by subjectID and drop the unused levels#. for region (the cSCC samples).Epi <- analysisLong[AOIInfo=="Epi"# Filter on Epi ][order(SubjectIDMatched) # Order by subjectID ][,Region:=droplevels(Region)]# get means for each gene in EpiepiMeans <-meansFun(Epi) # Drop unused region levels# filter only for only Derm samples, order by subjectID and drop the unused levels#. for region (the cSCC samples).Derm <- analysisLong[AOIInfo=="Derm"# Filter on Epi ][order(SubjectIDMatched) # Order by subjectID ][,Region:=droplevels(Region)]dermMeans <-meansFun(Derm) # Drop unused region levels############################################################# Rearrange results#### Here we are stacking the means and se's for each region ## On top of each other to make it easier to plot.###########################################################epiRes <-rbind(epiMeans[,.(genes,mean=SP,se=SP.se,Region="SP")], epiMeans[,.(genes,mean=SD,se=SD.se,Region="SD")], epiMeans[,.(genes,mean=AKEdge,se=AKEdge.se,Region="AKEdge")], epiMeans[,.(genes,mean=AKCenter,se=AKCenter.se,Region="AKCenter")])# Factor region to make sure region is in correct order. epiRes[,Region:=factor(Region,levels =c("SP","SD","AKEdge","AKCenter"))]dermRes <-rbind(dermMeans[,.(genes,mean=SP,se=SP.se,Region="SP")], dermMeans[,.(genes,mean=SD,se=SD.se,Region="SD")], dermMeans[,.(genes,mean=AKEdge,se=AKEdge.se,Region="AKEdge")], dermMeans[,.(genes,mean=AKCenter,se=AKCenter.se,Region="AKCenter")])# Factor region to make sure region is in correct order. dermRes[,Region:=factor(Region,levels =c("SP","SD","AKEdge","AKCenter"))]################################################################################ Plot Epi results. I am breaking this up into seperate batches to help with readability## each batch will have 12 genes. ############################################################################### Genes 1:12genes12 <- genes[1:12]ggplot(epiRes[genes %in% genes12],aes(x=as.numeric(Region), y=mean)) +geom_point() +geom_line() +geom_errorbar(aes(ymax = mean+1.96*se, ymin = mean-1.96*se), width=.25) +facet_wrap(~genes, scales ="free_y") +theme_bw() +scale_x_continuous(breaks =c(1,2,3,4),labels =c("SP","SD","AK-Edge","AK-Center")) +ylab("log2(CPM)") +xlab("Region") +ggtitle("Means and 95% confidence intervals accross regions","Epi set 1") +labs(caption ="Means were estimated using GEE \n to account for correlation from repeat sampling") +theme(axis.text.x =element_text(angle =90))
Code
#ggsave("Results/Means/EpiSelectedGenesMeansBatch1.png")genes24 <- genes[13:24]ggplot(epiRes[genes %in% genes24],aes(x=as.numeric(Region), y=mean)) +geom_point() +geom_line() +geom_errorbar(aes(ymax = mean+1.96*se, ymin = mean-1.96*se), width=.25) +facet_wrap(~genes, scales ="free_y") +theme_bw() +scale_x_continuous(breaks =c(1,2,3,4),labels =c("SP","SD","AK-Edge","AK-Center")) +ylab("log2(CPM)") +xlab("Region") +ggtitle("Means and 95% confidence intervals accross regions","Epi set2") +labs(caption ="Means were estimated using GEE \n to account for correlation from repeat sampling") +theme(axis.text.x =element_text(angle =90))
Code
#ggsave("Results/Means/EpiSelectedGenesMeansBatch2.png")########################################## Plot derm results######################################### Genes 1:12genes12 <- genes[1:12]ggplot(dermRes[genes %in% genes12],aes(x=as.numeric(Region), y=mean)) +geom_point() +geom_line() +geom_errorbar(aes(ymax = mean+1.96*se, ymin = mean-1.96*se), width=.25) +facet_wrap(~genes, scales ="free_y") +theme_bw() +scale_x_continuous(breaks =c(1,2,3,4),labels =c("SP","SD","AK-Edge","AK-Center")) +ylab("log2(CPM)") +xlab("Region") +ggtitle("Means and 95% confidence intervals accross regions","Derm set 1") +labs(caption ="Means were estimated using GEE \n to account for correlation from repeat sampling") +theme(axis.text.x =element_text(angle =90))
Code
#ggsave("Results/Means/DermSelectedGenesMeansBatch1.png")genes24 <- genes[13:24]ggplot(dermRes[genes %in% genes24],aes(x=as.numeric(Region), y=mean)) +geom_point() +geom_line() +geom_errorbar(aes(ymax = mean+1.96*se, ymin = mean-1.96*se), width=.25) +facet_wrap(~genes, scales ="free_y") +theme_bw() +scale_x_continuous(breaks =c(1,2,3,4),labels =c("SP","SD","AK-Edge","AK-Center")) +ylab("log2(CPM)") +xlab("Region") +ggtitle("Means and 95% confidence intervals accross regions","Derm set 2") +labs(caption ="Means were estimated using GEE \n to account for correlation from repeat sampling") +theme(axis.text.x =element_text(angle =90))
The goal of this analysis is to find differentlially expressed genes in
Epi Sun-Damaged vs cSCC (edge, center and tumor combined)
Epi AK-Edge vs cSCC (edge, center and tumor combined)
Epi AK-Center vs cSCC (edge, center and tumor combined)
Derm Sun-Damaged vs stromal cSCC
Derm AK-Edge vs stromal cSCC
Derm AK-Center vs stromal cSCC
For this we model SD, AK-edge and AK-center separately to break up the correlation between the regions. The cSCC samples are from different people than the SD-AK-Center samples and contain the regions cSCC-Edge, cSCC-Center, cSCC-Stroma and cSCC-Tumor. For this analysis we are comparing the Epi SD and AK samples to the to the cSCC-tumor, cSCC-Edge and cSCC-Center samples combined as one cSCC tissue type. The derm SD and AK samples are are compared to the cSCC stroma samples. We use GEE with an exchangeable correlation structure to account for the within person correlation between samples. For each gene, we collect the estimated log2 CPM fold change and the pvalue. HLA . In the epidermis samples genes HLA-DOB, CD86, HLA-DOA, PDCD1, and TNFSF9 were down regulated in cSCC compared to SD, AKEdge and AK Center. While IFNGR1 was up regulated in cSCC compared to all other regions. In the dermis samples CD27 was up regulated in cSCC stroma compared to all other dermis regions (SK through AKCenter), while TNFSF9 was down regulated compared to all other regions.
We plot the results in an interative heatmap that shows the significant fold changes for each gene and each comparison.
Code
# remove data in enviroment (this keeps memory low)#rm(list=ls())######################################### Step 1: load analysis data######################################## Load in the analysis1 data analysis1 and make sure subject ID is a factor. analysis <-readRDS("GeneAnalysisData.RDS")setDT(analysis)#Check #analysis2[1:10,1:10] #table(analysis2$SubjectID) ############################################ Step 2:## Collapse the cSCC regions into one cSCC region. Also add a second region variable## that has the AKCenter and AK-Edge collapsed. ##########################################analysis2 <- analysis[,Region2 :=factor(ifelse(grepl("cSCC",Region), "cSCC",Region),levels =c("SunProtected","SunDamaged","AKEdge","AKCenter","cSCC"))]# Check#table(analysis2$Region, analysis2$Region2)#table(analysis2$AOIInfo,analysis2$Region2)################################################## Put analysis data into long format.##################################################get genes for analysisgenes <-c("CD274","CD28","CD80","CD86","CIITA","CTLA4","HAVCR2","HLA-DMA","HLA-DOA","HLA-DOB","HLA-DPB1","HLA-DRB1","LAG3","LGALS1","LGALS3","LGALS4","PDCD1","PDCD1LG2", "PVR", "TIGIT", "TNFRSF18", "TNFRSF9", "TNFSF18", "TNFSF9","TNFRSF14", "CD70", "PBK", "TP53RK","TPRKB", "CD27", "TLR4", "STAT1", "IFNG", "IFNGR1", "IFNGR2", "MYD88", "HMGB1")# Get columns needed for analysiscols <-c("SubjectIDMatched", "AOIInfo","Region","Region2", genes)analysisLong <-melt.data.table(analysis2[,.SD, .SDcols = cols],id.vars =c("SubjectIDMatched","AOIInfo","Region","Region2"),variable.name ="Gene",value.name ="Exp")#table(analysisLong$Gene)################################################################### Create gene list##################################################################genes<-c("CD274","CD28","CD80","CD86","CIITA","CTLA4","HAVCR2","HLA-DMA","HLA-DOA","HLA-DOB","HLA-DPB1","HLA-DRB1","LAG3","LGALS1","LGALS3","LGALS4","PDCD1","PDCD1LG2", "PVR", "TIGIT", "TNFRSF18", "TNFRSF9", "TNFSF18", "TNFSF9" )#################################################################### Run the following gee models#### Epi Sun-Damaged vs cSCC (edge, center and tumor combined) #### Epi AK-Edge vs cSCC (edge, center and tumor combined) #### Epi AK-Center vs cSCC (edge, center and tumor combined) ###### Derm Sun-Damaged vs stromal cSCC #### Derm AK-Edge vs stromal cSCC #### Derm AK-Center vs stromal cSCC ################################################################### Plot results. This function takes the results from above and plots the genes## with a significant fold change. #plotSigResults <- function(res, title=""){# Filter only significant results# sig = res[pvalue<0.05]# wrap_plots(sig$plot) +# plot_annotation(title=title)#}# Get regions for Epi cSCCEpicSCC <-c("cSCC_Center","cSCC_Edge","cSCC_Tumor")# For each analysis we filter on the region of interest, drop the unused levels# of the Region2 variable and order the observations by SubjectID (for GEEPack)# EPI SD Vs. cSCCEpiSDVscSCC<-resFun(analysisLong[(Region %in% EpicSCC) | (Region=="SunDamaged"& AOIInfo=="Epi") ][,Region2:=droplevels(Region2) # Drop unused levels ][order(SubjectIDMatched)]) # order by SubjectID#plotSigResults(EpiSDVscSCC)# EPI AKEdge vs cSCCEpiAKEdgeVscSCC<-resFun(analysisLong[(Region %in% EpicSCC) | (Region=="AKEdge"& AOIInfo=="Epi") ][,Region2:=droplevels(Region2) # Drop unused levels ][order(SubjectIDMatched)])#plotSigResults(EpiAKEdgeVscSCC)# EPI AK Center vs cSCCEpiAKCenterVScSCC <-resFun(analysisLong[(Region %in% EpicSCC) | (Region=="AKEdge"& AOIInfo=="Epi") ][,Region2:=droplevels(Region2) # Drop unused levels ][order(SubjectIDMatched)])#plotSigResults(EpiAKCenterVScSCC)# Derm SD vrs stromaDermSDVsStroma <-resFun(analysisLong[(Region =="cSCC_Stroma") | (Region=="SunDamaged"& AOIInfo=="Derm") ][,Region2:=droplevels(Region2) # Drop unused levels ][order(SubjectIDMatched)]) #plotSigResults(DermSDVsStroma)# Derm AK Edge vrs stromaDermAKEdgeVsStroma <-resFun(analysisLong[(Region =="cSCC_Stroma") | (Region=="AKEdge"& AOIInfo=="Derm") ][,Region2:=droplevels(Region2) # Drop unused levels ][order(SubjectIDMatched)]) #plotSigResults(DermAKEdgeVsStroma)# Derm AK Edge vrs stromaDermAKCenterVsStroma <-resFun(analysisLong[(Region =="cSCC_Stroma") | (Region=="AKCenter"& AOIInfo=="Derm") ][,Region2:=droplevels(Region2) # Drop unused levels ][order(SubjectIDMatched)]) #plotSigResults(DermAKCenterVsStroma)################################################################################ Plot results in a heatmap##############################################################################combindedResults <-rbind(EpiSDVscSCC[,Comp :="cSCC Vs Epi SD"], EpiAKEdgeVscSCC[,Comp :="cSCC Vs Epi AKEdge"], EpiAKCenterVScSCC[,Comp :="cSCC Vs Epi AKCenter"], DermSDVsStroma[,Comp :="cSCC Stroma Vs Derm SD"], DermAKEdgeVsStroma[,Comp :="cSCC Stroma Vs Derm AKEdge"], DermAKCenterVsStroma[,Comp :="cSCC Stroma Vs Derm AKCenter"])## Make sure the comparisons are in the right order and set to NA any fold change## that was not significant. Also add a text variable that will allow use to hovercombindedResults[,`:=`(# Update comparisoncomparison =factor(Comp,levels =c("cSCC Vs Epi SD","cSCC Vs Epi AKEdge","cSCC Vs Epi AKCenter","cSCC Stroma Vs Derm SD","cSCC Stroma Vs Derm AKEdge","cSCC Stroma Vs Derm AKCenter")),sigFoldChange =ifelse(pvalue<0.05,Log2FoldChange,NA),# Add text variable this is used for the interactive heatmap. text =paste0("Gene: ",Gene, "\n","Comparison: ",Comp, "\n","Fold Change: ",round(Log2FoldChange,3),"\n","P-value: ",round(pvalue,3)))]################################################################################ Give quick summary of results################################################################################ Give table with resultscombindedResults[,AOI:=ifelse(grepl("Epi",comparison),"Epi","Derm")]# Summarizes for each gene the number of tests that were significant with a max of 3. resTable <- combindedResults[,.(Up=sum(pvalue<0.05&sigFoldChange>0),Down =sum(pvalue<0.05&sigFoldChange<0)),by=.(Gene,AOI)]tab <-cbind(resTable[AOI=="Epi",-2], resTable[AOI=="Derm",-c(1:2)])[ ,sums:=rowSums(.SD), .SDcols=is.numeric][order(-sums)][,sums:=NULL]tab%>%kbl(caption="Number of significant comparsons for each gene") %>%add_header_above(c("","Epi"=2,"Derm"=2)) %>%kable_classic() %>%scroll_box(width ="100%", height ="400px")
Number of significant comparsons for each gene
Epi
Derm
Gene
Up
Down
Up
Down
HLA-DOA
0
3
0
0
PDCD1
0
3
0
0
CIITA
0
0
2
0
HLA-DMA
0
1
1
0
TNFRSF18
0
0
2
0
TNFSF9
0
0
0
2
PBK
0
0
0
2
MYD88
2
0
0
0
CD80
0
1
0
0
CD86
0
1
0
0
HLA-DPB1
0
0
1
0
HLA-DRB1
0
0
1
0
LGALS3
0
0
0
1
LGALS4
0
0
0
1
PDCD1LG2
0
0
0
1
PVR
0
0
0
1
TNFSF18
0
1
0
0
CD70
0
0
0
1
CD27
0
0
1
0
STAT1
0
0
1
0
IFNG
0
0
0
1
IFNGR1
0
0
1
0
HMGB1
0
0
1
0
CD274
0
0
0
0
CD28
0
0
0
0
CTLA4
0
0
0
0
HAVCR2
0
0
0
0
HLA-DOB
0
0
0
0
LAG3
0
0
0
0
LGALS1
0
0
0
0
TIGIT
0
0
0
0
TNFRSF9
0
0
0
0
TNFRSF14
0
0
0
0
TP53RK
0
0
0
0
TPRKB
0
0
0
0
TLR4
0
0
0
0
IFNGR2
0
0
0
0
Code
############################################################################### Create the heatmap of estimates and pvalues. #### Genes as the rows and the tests will be the columns. The values are filled## with the fold change estimate only if the the fold change was significant. ##############################################################################heatPlot <-ggplot(combindedResults, aes(x=comparison, y=Gene,text= text)) +geom_tile(aes(fill=sigFoldChange)) +scale_fill_gradient(low ="blue", high ="red") +theme_bw() +theme(axis.text.x =element_text(angle =15, vjust = .5)) +ggtitle("Significant fold changes") +xlab("Comparison")# Show ggplot interactively. ggplotly(heatPlot, tooltip ="text", width =800, height =800)
Code
################################################################################## Save results################################################################################colsToKeep <-c("Gene","Log2FoldChange","pvalue","comparison","AOI")#write.csv(combindedResults[,.SD,.SDcols = colsToKeep], file = "Results/DifferentialExpression/SelectedGenesSDThroughAKVscSCC.csv")